

@hellkite500
Contributor

@hellkite500 hellkite500 commented Oct 27, 2025

Pulled from #909 into its own PR to better address some AddressSanitizer failures on the updated macos-15 runners. I'll rebase once that PR lands.

@PhilMiller
Contributor

Looks like you've still got all the mass balance changes in here. Rebase?

@hellkite500
Contributor Author

> Looks like you've still got all the mass balance changes in here. Rebase?

That's the plan, should be updated shortly.

@robertbartel
Contributor

I've tried to help figure out why that last macOS 15 failure is happening (an AddressSanitizer global-buffer-overflow related to the test_bmi_c test shared library's Get_var_type function and the output_var_names variable), but I haven't had any more luck. I'll add some comments, though it's likely @hellkite500 and @aaraney have already arrived at these same conclusions.

The actual AddressSanitizer global-buffer-overflow error message:
==7378==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000101300160 at pc 0x0001012faf44 bp 0x00016f092ec0 sp 0x00016f092eb8
READ of size 8 at 0x000101300160 thread T0
    #0 0x0001012faf40 in Get_var_type bmi_test_bmi_c.c:532
    #1 0x0001012fb078 in Get_var_itemsize bmi_test_bmi_c.c:428
    #2 0x000100eda058 in models::bmi::Bmi_C_Adapter::GetVarItemsize(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) Bmi_C_Adapter.cpp:195
    #3 0x000100ebe008 in realization::Bmi_Module_Formulation::set_initial_bmi_parameters(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>>) Bmi_Module_Formulation.cpp:488
    #4 0x000100eb2c8c in realization::Bmi_Module_Formulation::inner_create_formulation(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>>, bool) Bmi_Module_Formulation.cpp:334
    #5 0x000100eb0b64 in realization::Bmi_Module_Formulation::create_formulation(boost::property_tree::basic_ptree<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>>*) Bmi_Module_Formulation.cpp:8
    #6 0x000100d989c4 in Bmi_C_Formulation_Test_Initialize_0_a_Test::TestBody() Bmi_C_Formulation_Test.cpp:232
    #7 0x000100e526c4 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2635
    #8 0x000100e523cc in testing::Test::Run() gtest.cc:2674
    #9 0x000100e54b84 in testing::TestInfo::Run() gtest.cc:2853
    #10 0x000100e569ec in testing::TestSuite::Run() gtest.cc:3012
    #11 0x000100e79f2c in testing::internal::UnitTestImpl::RunAllTests() gtest.cc:5870
    #12 0x000100e790e4 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) gtest.cc:2635
    #13 0x000100e78e84 in testing::UnitTest::Run() gtest.cc:5444
    #14 0x000100e9e974 in main gtest_main.cc:51
    #15 0x0001852beb94  (<unknown module>)

0x000101300160 is located 32 bytes before global variable 'output_var_types' defined in '/Users/runner/work/ngen/ngen/extern/test_bmi_c/src/bmi_test_bmi_c.c' (0x000101300180) of size 16
0x000101300160 is located 0 bytes inside of global variable 'output_var_names' defined in '/Users/runner/work/ngen/ngen/extern/test_bmi_c/src/bmi_test_bmi_c.c' (0x000101300160) of size 16
0x000101300160 is located 10016 bytes inside of global variable 'm' defined in '<null>' (0x0001012fda40) of size 4314880576
0x000101300160 is located 10464 bytes inside of global variable 'double' defined in '<null>' (0x0001012fd880) of size 4314880128
SUMMARY: AddressSanitizer: global-buffer-overflow bmi_test_bmi_c.c:532 in Get_var_type
Shadow bytes around the buggy address:
  0x0001012ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0001012fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0001012fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000101300000: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000101300080: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
=>0x000101300100: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9[f9]f9 f9 f9
  0x000101300180: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000101300200: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000101300280: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000101300300: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000101300380: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==7378==ABORTING

I haven't been able to reproduce the problem locally either. I will note that it looks like the MacOS 15 runners use LLVM 18, while I've been testing with clang 16. I'm curious what specific versions @hellkite500 and @aaraney have been using.

I did confirm that, if the main project CMake build is set up to link AddressSanitizer into the project targets (e.g., the testing executable), AddressSanitizer will also be linked into the test_bmi_c shared library (i.e., when the latter gets built as a project dependency of the former).

If I had to guess at this point, the problem has something to do with how AddressSanitizer handles memory for global items. I've found a few things (e.g., here and here) suggesting that kind of connection. I started to examine the LLVM codebase to see if I could figure out what's going on, but after some initial looking around, it's clear that's going to require a much larger time investment. Maybe something that should be done eventually, but not right now.

My suggestion as an immediate solution is to suppress AddressSanitizer errors from the test_bmi_c testing shared library, as discussed here.
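If the suppression route were taken anyway, a narrower option than a file-wide ignore list would be to turn off ASan instrumentation for only the lookup function, using the `no_sanitize("address")` function attribute that Clang and GCC 8+ accept. This is a minimal sketch, not the module's code: `lookup_output_var_type`, the `NO_ASAN` macro, and the `var_a`/`var_b` names are hypothetical placeholders. Two caveats: it would also hide a genuine bug in that function, and libc calls such as `strcmp` are still checked by ASan's interceptors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand to the "do not instrument this function" attribute where the
 * compiler supports it; otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(no_sanitize)
#    define NO_ASAN __attribute__((no_sanitize("address")))
#  endif
#endif
#ifndef NO_ASAN
#  define NO_ASAN
#endif

#define OUTPUT_VAR_NAME_COUNT 2

/* Placeholder tables; the real module's metadata differs. */
static const char *output_var_names[OUTPUT_VAR_NAME_COUNT] = { "var_a", "var_b" };
static const char *output_var_types[OUTPUT_VAR_NAME_COUNT] = { "double", "double" };

/* Hypothetical helper: copies the type of `name` into `type` (capacity
 * `type_len`). Returns 0 on success, 1 on bad arguments or unknown name.
 * Loads of the global tables inside this function are not instrumented. */
NO_ASAN static int lookup_output_var_type(const char *name, char *type, size_t type_len)
{
    size_t i;
    if (name == NULL || type == NULL || type_len == 0)
        return 1;
    for (i = 0; i < OUTPUT_VAR_NAME_COUNT; i++) {
        if (strcmp(name, output_var_names[i]) == 0) {
            strncpy(type, output_var_types[i], type_len - 1);
            type[type_len - 1] = '\0';
            return 0;
        }
    }
    return 1;
}
```

The attribute only affects loads and stores compiled inside that one function, such as the 8-byte read of `output_var_names[i]` that the report flags.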

@PhilMiller
Contributor

In my experience, ASan is almost certainly correct that there is an error, and that it is occurring during execution of the statement described by the stack trace. Thus, it's probably inappropriate to suppress it, and digging into the ASan implementation is likely a misguided rabbit hole.

I'd recommend focusing on trying to find out the name of the BMI variable being queried for purposes of setting, such as by printing it to standard error like ASan writes its notification. Alternately, just audit all of the variables that you expect to go through that code path, and make sure the BMI metadata functions and the tables they're relying on are correct for them.
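A minimal sketch of the first suggestion: log each queried name (and the address about to be read) to standard error on entry to the type lookup, so the last line printed before the ASan report identifies the query. The function name `debug_get_var_type` and the placeholder variable names are hypothetical; the `BMI_SUCCESS`/`BMI_FAILURE` values mirror the usual BMI C header but are defined locally here as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef BMI_SUCCESS
#define BMI_SUCCESS 0   /* assumption: real value comes from bmi.h */
#endif
#ifndef BMI_FAILURE
#define BMI_FAILURE 1   /* assumption: real value comes from bmi.h */
#endif

#define OUTPUT_VAR_NAME_COUNT 2
static const char *output_var_names[OUTPUT_VAR_NAME_COUNT] = { "var_a", "var_b" };
static const char *output_var_types[OUTPUT_VAR_NAME_COUNT] = { "double", "double" };

/* Hypothetical debug variant of a Get_var_type-style lookup. */
static int debug_get_var_type(const char *name, char *type, size_t type_len)
{
    size_t i;
    /* Print before touching the tables, and flush, so the line survives
     * even if ASan aborts on the very next load. */
    fprintf(stderr, "debug_get_var_type: name=\"%s\"\n", name ? name : "(null)");
    fflush(stderr);
    if (name == NULL || type == NULL || type_len == 0)
        return BMI_FAILURE;
    for (i = 0; i < OUTPUT_VAR_NAME_COUNT; i++) {
        /* Taking the element's address does not read it, so this is safe. */
        fprintf(stderr, "  reading output_var_names[%zu] at %p\n",
                i, (void *)&output_var_names[i]);
        if (strcmp(name, output_var_names[i]) == 0) {
            snprintf(type, type_len, "%s", output_var_types[i]);
            return BMI_SUCCESS;
        }
    }
    return BMI_FAILURE;
}
```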

@hellkite500
Contributor Author

> In my experience, ASan is almost certainly correct that there is an error, and that it is occurring during execution of the statement described by the stack trace. Thus, it's probably inappropriate to suppress it, and digging into the ASan implementation is likely a misguided rabbit hole.

> I'd recommend focusing on trying to find out the name of the BMI variable being queried for purposes of setting, such as by printing it to standard error like ASan writes its notification. Alternately, just audit all of the variables that you expect to go through that code path, and make sure the BMI metadata functions and the tables they're relying on are correct for them.

I've been going through these steps as time permits and trying to understand what could be happening here. It is somewhat odd that GCC's sanitizer doesn't see this issue, and the ASan info produced in this error is a bit puzzling as I trace through the test code and the code paths to the offending line.

I'll try to write up my thoughts along the way here.

@PhilMiller
Contributor

PhilMiller commented Oct 29, 2025

So, the actual ASan code is shared between GCC and LLVM. The runtime library they use is imported from the same repository, and they generate what's supposed to be functionally identical instrumentation. If an error is reported with one compiler but not the other, or with the same compiler on different platforms, or with different versions of the same compiler, that would typically indicate some sort of ill-defined behavior in the underlying code being tested. Most typically, it's an uninitialized variable being used to index into some data structure, whose observed contents differ across those circumstances.

@robertbartel
Copy link
Contributor

> In my experience, ASan is almost certainly correct that there is an error,

That's been my impression. What puzzles me here is both that it does not report an error in all environments/versions/scenarios and that it is not at all apparent why the involved code would ever overflow the buffer.

> I'd recommend focusing on trying to find out the name of the BMI variable being queried for purposes of setting

It's not actually a BMI variable, but a static global array that holds BMI variable metadata. For reference:

static const char *output_var_types[OUTPUT_VAR_NAME_COUNT] = { "double", "double" };

The later code being flagged uses that same defined constant, OUTPUT_VAR_NAME_COUNT, for its loop:

static int Get_var_type (Bmi *self, const char *name, char * type)
{
    size_t i;
    // Check to see if in output array first
    for (i = 0; i < OUTPUT_VAR_NAME_COUNT; i++) {
        if (strcmp(name, output_var_names[i]) == 0) {      // This is where ASan gets mad

Also, if I'm interpreting this correctly (but I could use a sanity check), the overflow is actually being flagged when reading the first item of the array:

==7378==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000101300160 at pc 0x0001012faf44 bp 0x00016f092ec0 sp 0x00016f092eb8
READ of size 8 at 0x000101300160 thread T0

...

0x000101300160 is located 0 bytes inside of global variable 'output_var_names' defined in '/Users/runner/work/ngen/ngen/extern/test_bmi_c/src/bmi_test_bmi_c.c' (0x000101300160) of size 16
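The sanity check is plain arithmetic on the addresses in the report: subtract the start of `output_var_names` from the faulting address and divide by the element size. A small sketch, assuming 8-byte pointers as on the arm64 macOS runners (the report's `READ of size 8` agrees); `element_index` is a hypothetical helper, not part of ngen or ASan.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the array element containing `fault_addr`, for a global that
 * starts at `base` and is `size` bytes long with `elem_size`-byte elements.
 * Returns -1 if the address lies outside [base, base + size). */
static long element_index(uint64_t fault_addr, uint64_t base,
                          uint64_t size, uint64_t elem_size)
{
    if (fault_addr < base || fault_addr >= base + size)
        return -1;
    return (long)((fault_addr - base) / elem_size);
}
```

With the report's values, `element_index(0x000101300160, 0x000101300160, 16, 8)` is 0: the first of 16 / 8 = 2 pointers. The same subtraction also reproduces the report's "32 bytes before global variable 'output_var_types'" (0x000101300180 - 0x000101300160 = 0x20).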

@PhilMiller
Contributor

OK, looking at this more closely, it starts to look like some sort of defect in ASan, or something managing to corrupt its shadow memory, which would involve truly impressive stack corruption.

As it tells you, the offending memory access on address 0x000101300160 is exactly the first of the two entries in output_var_names:

0x000101300160 is located 0 bytes inside of global variable 'output_var_names' defined in '/Users/runner/work/ngen/ngen/extern/test_bmi_c/src/bmi_test_bmi_c.c' (0x000101300160) of size 16

The shadow memory values covering output_var_names and output_var_types and a lot of space around them are all f9, "Global redzone".
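The shadow dump can be decoded the same way. In this report the row labels appear to be application addresses: they step by 0x80, and each row holds 16 shadow bytes that each cover 8 application bytes (per the legend), which is 128 bytes per row. A sketch of that arithmetic, plus the shadow bytes ASan's standard encoding would give a fully addressable 16-byte global; `row_slot` and `expected_shadow` are illustrative helpers, not ASan APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_GRANULARITY 8u  /* application bytes per shadow byte */

/* Position of `addr` within a shadow-dump row whose label is `row_base`. */
static unsigned row_slot(uint64_t row_base, uint64_t addr)
{
    return (unsigned)((addr - row_base) / SHADOW_GRANULARITY);
}

/* Fill `out` with the shadow bytes expected for an addressable object of
 * `size` bytes starting on a granule boundary: 0x00 for each full 8-byte
 * granule, then k (1..7) for a trailing partial granule. Returns the count. */
static size_t expected_shadow(uint64_t size, unsigned char *out, size_t out_len)
{
    size_t n = 0;
    while (size >= SHADOW_GRANULARITY && n < out_len) {
        out[n++] = 0x00;
        size -= SHADOW_GRANULARITY;
    }
    if (size > 0 && n < out_len)
        out[n++] = (unsigned char)size;
    return n;
}
```

`row_slot(0x000101300100, 0x000101300160)` is 12, exactly the bracketed `[f9]` in the `=>` row, and `expected_shadow(16, ...)` yields `00 00`. So slots 12 and 13 of that row would be expected to read `00 00`; the dump shows `f9` (global redzone) there instead.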

My intuition here is that something is going wrong with either a mis-matched build between the ngen binary and the test_bmi_c dynamic library, or with how we're using dlopen and its interaction with ASan.

I'm going to take a look at the build logs to see if anything jumps out at me.

@PhilMiller
Contributor

Per one of your links, did you try adding -asan-globals=0 to see if the same error is still reported?

@PhilMiller
Contributor

With what I've seen so far, I'm willing to entertain the notion that ASan is actually broken here. I've seen it simply fail to load a program before in weird enough circumstances, but this is the first time I've seen an apparent false positive error report.

In grad school and previous jobs, I would have been able to relish the opportunity to dig in to the root cause, report it, and maybe actually fix this. Alas, not what we're paid for here.

@PhilMiller
Contributor

PhilMiller commented Oct 29, 2025

Besides, and maybe before -asan-globals=0, could you (at least temporarily) make the build in the CI job VERBOSE, so we can see the full compiler command lines as they were run?

https://groups.google.com/g/address-sanitizer/c/XJXOrSvN8vg?pli=1 makes me mildly concerned that maybe the test binary has not been compiled and/or linked with the sanitizer flags. Maybe CFLAGS is fine, but we need to set LDFLAGS as well? This quirk may be platform/version-specific, which would explain why it's only seen on this target and not others.

Google's AI overview (ugh) also tipped me off that there may be an initialization order issue in play here. I'll note that this error seems to be occurring on the very first library-global-variable memory access in the very first code called in the dynamically-loaded BMI module. That would also potentially point to a mis-matched compilation/linkage concern.
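One way to probe a mis-matched build from inside the code itself is to have a translation unit report whether it was compiled with ASan and, when it was, ask the runtime whether one of its own globals is poisoned. A sketch assuming Clang's `__has_feature(address_sanitizer)` or GCC's `__SANITIZE_ADDRESS__` macro and `__asan_address_is_poisoned` from `<sanitizer/asan_interface.h>`; `module_built_with_asan`, `probe_global`, and `probe_target` are hypothetical names. Called from the test module's initialization, something like this would show whether the module saw itself as instrumented and whether its tables were already marked as redzone before the first lookup.

```c
#include <assert.h>
#include <stdio.h>

/* Was this translation unit compiled with -fsanitize=address? */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define TU_HAS_ASAN 1
#  endif
#endif
#if !defined(TU_HAS_ASAN) && defined(__SANITIZE_ADDRESS__)
#  define TU_HAS_ASAN 1
#endif
#ifndef TU_HAS_ASAN
#  define TU_HAS_ASAN 0
#endif

#if TU_HAS_ASAN
#include <sanitizer/asan_interface.h>
#endif

/* Hypothetical stand-in for a module-level metadata table. */
static const char *probe_target[2] = { "var_a", "var_b" };

static int module_built_with_asan(void)
{
    return TU_HAS_ASAN;
}

/* Returns 1 if ASan considers `addr` poisoned, else 0. Always 0 in an
 * uninstrumented build, where there is no shadow memory to consult. */
static int probe_global(const volatile void *addr, const char *label)
{
    int poisoned = 0;
#if TU_HAS_ASAN
    poisoned = __asan_address_is_poisoned(addr);
#endif
    fprintf(stderr, "asan=%d %s at %p poisoned=%d\n",
            module_built_with_asan(), label, (void *)addr, poisoned);
    return poisoned;
}
```

In a healthy instrumented build, `probe_global(&output_var_names[0], "output_var_names[0]")` would be expected to return 0; a 1 there would match the `f9` shadow bytes in the report.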

Google's AI Overview as it appeared for me in response to the query string "addresssanitizer dlopen global false positive":

An AddressSanitizer (ASan) false positive involving a dlopen call on a global variable is rare but most often caused by a dynamic initialization order issue or an uninstrumented library.

Common causes and solutions

  1. Dynamic Initialization Order Fiasco (C++)
    When a shared library is loaded with dlopen, its global variables are dynamically initialized. A false positive can occur if your main executable's global variables access a global from the shared library before the library has fully initialized, or after it has been unloaded via dlclose.
Symptom: The ASan report will indicate a memory error related to a global variable during or immediately after a dlopen or dlclose call.
Solution:
    Enable ASan's initialization-order-fiasco checker by setting the environment variable ASAN_OPTIONS=check_initialization_order=1 (not supported on macOS). This can help pinpoint the exact initialization-order problem.
    Restructure your code to avoid accessing globals in different translation units from within a global constructor. If possible, initialize complex globals lazily.
    Use an ASan ignore list to suppress known, benign initialization-order violations. 
  2. Mixing ASan and non-ASan libraries
    If a shared library is not built with ASan, but your main program is, issues can arise. The ASan runtime may not understand the memory layout of the uninstrumented library, or how it allocates and deallocates memory, leading to false positives.

    Symptom: ASan might report a use-after-free or other memory error related to memory managed by the non-ASan library, especially after it's loaded or unloaded.
    Solution:
    Instrument the shared library with ASan as well. This is the most reliable solution.
    If you cannot rebuild the library, avoid performing any operations that might confuse the ASan runtime across the library boundary.
    Use a suppression file to ignore specific, known-benign reports from the uninstrumented library.

  3. Overlapping global variable redzones (Visual Studio)
    On Visual Studio, an ASan bug has been reported where the redzones for two adjacent global variables can overlap, triggering a spurious global-buffer-overflow error. This is a known issue, and can be triggered during initialization by the C runtime library.

    Symptom: ASan reports a global-buffer-overflow very early in program execution, potentially before main is called.
    Solution:
    If you are using Visual Studio, ensure you are using the latest version with the most recent ASan updates.
    If you can't update, add padding or reorder your globals to prevent the compiler from placing them contiguously.

  4. dlclose and memory reuse
    If a shared library is unloaded via dlclose, but ASan's shadow memory isn't properly cleared, future memory allocations could occupy the same address space. ASan could then mistakenly report an error on this new memory because it still believes the old global variables are in place.

    Symptom: An ASan error occurs in memory that was recently freed and re-allocated after a dlclose call.
    Solution: The ASan runtime should handle this correctly in most cases. If you suspect an issue, investigate the specific ASan version and platform, as this can indicate a bug in the sanitizer itself.

How to debug

Isolate the issue. Try to create a minimal, reproducible example that demonstrates the false positive.
Check the ASan log. The report contains a stack trace that can point to the code responsible.
Disable checks selectively. Temporarily disable the check_initialization_order option or other specific ASan checks to see if the false positive disappears.
Consider a suppression file. Create a file with suppression rules to ignore the specific ASan error. This should be a last resort after confirming it's a genuine false positive. 

@robertbartel
Contributor

> https://groups.google.com/g/address-sanitizer/c/XJXOrSvN8vg?pli=1 makes me mildly concerned that maybe the test binary has not been compiled and/or linked with the sanitizer flags

While not a definitive analysis, I did confirm in my own development environment that if CFLAGS and CXXFLAGS are set for using ASan, both the test executable (test_bmi_c) and the test BMI module shared library use @rpath/libclang_rt.asan_osx_dynamic.dylib and have plenty of __asan_ entries in their symbol tables. I suppose it's still possible something subtle in the runner environment is causing this not to work exactly the same way, though in that case I would have expected the test binary to be fine and only the test BMI module to be improperly compiled/linked.

> I'll note that this error seems to be occurring on the very first library-global-variable memory access in the very first code called in the dynamically-loaded BMI module. That would also potentially point to a mis-matched compilation/linkage concern.

It is ... and it isn't.

The ASan error happens in the first test of the Bmi_C_Formulation_Test class. I didn't notice or think about this earlier, but prior to those BMI C formulation tests, all the Bmi_C_Adapter_Test tests are run and pass. And Bmi_C_Adapter_Test does exercise the module's Get_var_type function (at minimum, in Bmi_C_Adapter_Test.GetVarType_0_a).

Curiously, the problem test actually has the ngen adapter class calling the test BMI module's Get_var_itemsize function, which in turn calls Get_var_type, where the potential buffer overflow code happens. Bmi_C_Adapter_Test does not appear to exercise Get_var_itemsize. On the surface, I wouldn't expect that to matter, but we're getting into some gnarly territory here.
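For context on why that path still lands in `Get_var_type`: the stack trace shows `Get_var_itemsize` calling it, which fits the common BMI C pattern of resolving the type name first and then mapping it to a byte count. A minimal sketch of that chain, using hypothetical `sketch_` function names, local return codes, and placeholder variables (with the second typed as `int` only to show the mapping), not the test module's actual implementation:

```c
#include <assert.h>
#include <string.h>

#define BMI_OK 0
#define BMI_ERR 1
#define TYPE_NAME_LEN 64

#define OUTPUT_VAR_NAME_COUNT 2
static const char *output_var_names[OUTPUT_VAR_NAME_COUNT] = { "var_a", "var_b" };
static const char *output_var_types[OUTPUT_VAR_NAME_COUNT] = { "double", "int" };

/* Look up the type name for an output variable. */
static int sketch_get_var_type(const char *name, char *type)
{
    size_t i;
    for (i = 0; i < OUTPUT_VAR_NAME_COUNT; i++) {
        if (strcmp(name, output_var_names[i]) == 0) {
            strncpy(type, output_var_types[i], TYPE_NAME_LEN - 1);
            type[TYPE_NAME_LEN - 1] = '\0';
            return BMI_OK;
        }
    }
    return BMI_ERR;
}

/* The item size is derived from the type name, so every item-size query
 * goes through the type lookup and therefore reads the same global tables. */
static int sketch_get_var_itemsize(const char *name, int *size)
{
    char type[TYPE_NAME_LEN];
    if (sketch_get_var_type(name, type) != BMI_OK)
        return BMI_ERR;
    if (strcmp(type, "double") == 0)
        *size = (int)sizeof(double);
    else if (strcmp(type, "int") == 0)
        *size = (int)sizeof(int);
    else
        return BMI_ERR;
    return BMI_OK;
}
```

The only difference from a direct type query is the extra stack frame and the local `type` buffer, which is consistent with the observation that, on the surface, the entry point shouldn't matter.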

@robertbartel
Contributor

Trying to get back to this, after falling down a rabbit hole and then having to step away from it ... unfortunately things are even less clear.

I've made a copy of this PR branch in my fork, preparing to try and test the -asan-globals=0 setting. I made only one small (temporary) tweak in my copy, to .github/test_and_validate.yml, just to have the GitHub Actions run when I push it.

Curiously, the initial Action run following this did not reproduce the ASan error discussed above (see here for that original error). There were other errors, though: this one related to numpy and this different ASan error related to the Python integration. Both still happen only in the Mac runner.

At first glance, I'm especially confused by the numpy one, both because it happened in the test_bmi_cpp runner job and because it didn't happen in the test_bmi_python runner job.

For the new ASan error, I noticed the involved test_bmi_multi runner job sets some ASAN_OPTIONS (any context here that may be relevant @hellkite500?), recently added but also not exclusive to that job.

And of course, neither of those were happening previously, at least not every time. Plus, what happened to the other ASan error?

I'm going to try re-running the jobs in my fork to see what happens.

@robertbartel
Contributor

Well, the same error happens in test_bmi_multi: a global-buffer-overflow related to the Python integration. It may well be the same kind of ASan error as the earlier one from the C BMI test module, just with some details obscured by the nature of the Python integration.

The previous error in test_bmi_cpp related to numpy didn't happen, but another ASan error did. This one looks very similar to the previously discussed ASan error from the C test BMI module, except this time with the C++ test module and the vtable for TestBmiCpp global variable. That makes it harder to see if/why a buffer overflow is happening, but it fits the notion that ngen's dynamic BMI module loading setup is involved, either because it confuses ASan or because it causes a subtle issue that ASan (sometimes) catches.

@robertbartel
Contributor

robertbartel commented Nov 18, 2025

Ok, I've managed to find some changes that seem to address at least some of the issues. These are in my fork, in the ci-updates branch (see links below). @hellkite500, you may want to cherry-pick these into the PR branch.

First, commit ddf4de3 sets the -fno-common flag. My reading of the documentation here is that this stops the compiler from emitting C global variables as common symbols, which ASan cannot instrument, so with the flag ASan is able to instrument those globals. I think this makes sense to apply based on what we've seen: problems related to globals in dynamically imported shared libraries, where it looks like ASan is doing something odd with them.
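For reference, -fcommon/-fno-common only changes how uninitialized, externally visible file-scope variables (C "tentative definitions") are emitted: under -fcommon they become common symbols, while -fno-common makes them ordinary definitions. Explicitly initialized globals and `static` globals are never common symbols, and upstream Clang 11 and GCC 10 already default to -fno-common for C. Judging from the excerpt quoted earlier, `output_var_types` is both `static` and initialized. A small illustrative translation unit, with hypothetical variable names:

```c
#include <assert.h>

/* Tentative definition with external linkage: a common symbol under
 * -fcommon, an ordinary zero-initialized definition under -fno-common. */
int tentative_counter;

/* Explicitly initialized: always an ordinary definition. */
int initialized_counter = 0;

/* Internal linkage: never a common symbol, with or without -fcommon. */
static const char *metadata_names[2] = { "var_a", "var_b" };

static int total(void)
{
    /* Both counters start at zero; the table's second entry is non-null. */
    return tentative_counter + initialized_counter + (metadata_names[1] != 0);
}
```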

Next, commit eebb25e applies detect_odr_violation=0 (already set in some other jobs) in the test_pet runner job. Occasionally, ODR errors happen in that job.
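As an alternative to setting ASAN_OPTIONS per job, ASan lets a program embed default options by defining `__asan_default_options`, which the runtime calls at startup; values in the ASAN_OPTIONS environment variable still override them. It would belong in the test executable rather than in a dlopen'd BMI module, since the runtime reads it during initialization. A sketch, with detect_odr_violation=0 only as an example value:

```c
#include <assert.h>
#include <string.h>

/* Default ASan options for a test executable built with -fsanitize=address.
 * The runtime looks this function up by name at startup; in a build without
 * ASan it is just an ordinary, unused function. */
const char *__asan_default_options(void)
{
    return "detect_odr_violation=0";
}
```

Keeping the setting in code would make it apply to every invocation of that test binary, not only the CI job that sets the environment variable.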

All that said, sometimes the test_bmi_multi global-buffer-overflow happens, and sometimes it doesn't. It seems like the -fsanitize-ignorelist=/tmp/asan_ignore.txt isn't being applied, as that looks like it contains the configuration needed to suppress errors like this. I've tried switching it to -fsanitize-blacklist (per this again), and applying the suppression file using the ASAN_OPTIONS=suppression=... syntax described here, but those didn't help. At this point, I suspect the suppressions aren't going to apply to the involved code, as that only works on code not compiled with AddressSanitizer.

@PhilMiller
Contributor

Ooh, could you paste in the text of one of those ODR errors? Regardless of all the other issues, those can cause lots of trouble in really weird ways, so it would be good to try to nail that down.

@robertbartel
Contributor

Sure ...

Example of an ODR error in the test_pet runner job:
Running main() from /Users/runner/work/ngen/ngen/test/googletest/googletest/src/gtest_main.cc
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from Bmi_C_Pet_IT
[ RUN      ] Bmi_C_Pet_IT.Test_InitModel
Error(Integration)::mass_balance: Error getting mass balance values for module 'Potential Evapotranspiration': BMI C model failed to get values for variable ngen::mass_in.

[       OK ] Bmi_C_Pet_IT.Test_InitModel (14 ms)
[ RUN      ] Bmi_C_Pet_IT.Test_GetResponse
=================================================================
==6567==ERROR: AddressSanitizer: odr-violation (0x000102a9eb20):
  [1] size=4 'pet_method_int' /Users/runner/work/ngen/ngen/extern/evapotranspiration/evapotranspiration/src/pet.c in libpetbmi.1.0.0.dylib
  [2] size=4 'pet_method_int' /Users/runner/work/ngen/ngen/extern/evapotranspiration/evapotranspiration/src/pet.c in libpetbmi.1.0.0.dylib
These globals were registered at these points:
  [1]:
    #0 0x000102b56a54 in __asan_register_globals+0x9c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x12a54)
    #1 0x000102b7fb84 in __asan::AsanApplyToGlobals(void (*)(__asan_global*, unsigned long), void const*)+0x7c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3bb84)
    #2 0x000102b5699c in __asan_register_image_globals+0x34 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x1299c)
    #3 0x000102a8dc60 in asan.module_ctor+0x18 (libpetbmi.1.0.0.dylib:arm64+0x5c60)
    #4 0x000199deeef8 in invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const+0x1b8 (dyld:arm64e+0xfffffffffff56ef8)
    #5 0x000199e2b898 in invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const+0x140 (dyld:arm64e+0xfffffffffff93898)
    #6 0x000199e4b5c8 in invocation function for block in mach_o::Header::forEachSection(void (mach_o::Header::SectionInfo const&, bool&) block_pointer) const+0xec (dyld:arm64e+0xfffffffffffb35c8)
    #7 0x000199e48354 in mach_o::Header::forEachLoadCommand(void (load_command const*, bool&) block_pointer) const+0xcc (dyld:arm64e+0xfffffffffffb0354)
    #8 0x000199e49a94 in mach_o::Header::forEachSection(void (mach_o::Header::SectionInfo const&, bool&) block_pointer) const+0x78 (dyld:arm64e+0xfffffffffffb1a94)
    #9 0x000199e2b368 in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const+0x200 (dyld:arm64e+0xfffffffffff93368)
    #10 0x000199deecb0 in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const+0xac (dyld:arm64e+0xfffffffffff56cb0)
    #11 0x000199df666c in dyld4::JustInTimeLoader::runInitializers(dyld4::RuntimeState&) const+0x20 (dyld:arm64e+0xfffffffffff5e66c)
    #12 0x000199def45c in dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&, dyld3::Array<dyld4::Loader const*>&) const+0x130 (dyld:arm64e+0xfffffffffff5745c)
    #13 0x000199df3bec in dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const::$_0::operator()() const+0xb0 (dyld:arm64e+0xfffffffffff5bbec)
    #14 0x000199def778 in dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const+0x2c8 (dyld:arm64e+0xfffffffffff57778)
    #15 0x000199e13140 in dyld4::APIs::dlopen_from(char const*, int, void*)::$_0::operator()() const+0x73c (dyld:arm64e+0xfffffffffff7b140)
    #16 0x000199e07f98 in dyld4::APIs::dlopen_from(char const*, int, void*)+0x46c (dyld:arm64e+0xfffffffffff6ff98)
    #17 0x000199e07a7c in dyld4::APIs::dlopen(char const*, int)+0x7c (dyld:arm64e+0xfffffffffff6fa7c)
    #18 0x000102b719b0 in dlopen+0x1bc (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x2d9b0)
    #19 0x000102151300 in models::bmi::AbstractCLibBmiAdapter::dynamic_library_load() AbstractCLibBmiAdapter.cpp:86
    #20 0x00010215de8c in models::bmi::Bmi_C_Adapter::execModuleRegistration() Bmi_C_Adapter.hpp:466
    #21 0x0001021541d0 in models::bmi::Bmi_C_Adapter::construct_and_init_backing_model_for_type() Bmi_C_Adapter.hpp:556
    #22 0x000102153bd0 in models::bmi::Bmi_C_Adapter::Bmi_C_Adapter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool) Bmi_C_Adapter.cpp:64
    #23 0x000102153890 in models::bmi::Bmi_C_Adapter::Bmi_C_Adapter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) Bmi_C_Adapter.cpp:34
    #24 0x000102120b84 in void std::__1::allocator<models::bmi::Bmi_C_Adapter>::construct[abi:ne190102]<models::bmi::Bmi_C_Adapter, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&>(models::bmi::Bmi_C_Adapter*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) allocator.h:165
    #25 0x000102118c88 in realization::Bmi_C_Formulation::construct_model(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>> const&) Bmi_C_Formulation.cpp:27
    #26 0x000102126ad8 in realization::Bmi_Module_Formulation::inner_create_formulation(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>>, bool) Bmi_Module_Formulation.cpp:330
    #27 0x000102124b2c in realization::Bmi_Module_Formulation::create_formulation(boost::property_tree::basic_ptree<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>>*) Bmi_Module_Formulation.cpp:8
    #28 0x00010204d20c in Bmi_C_Pet_IT_Test_InitModel_Test::TestBody() Bmi_C_Pet_IT.cpp:172
    #29 0x0001020c3f40 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2635

  [2]:
    #0 0x000102b56a54 in __asan_register_globals+0x9c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x12a54)
    #1 0x000102b7fb84 in __asan::AsanApplyToGlobals(void (*)(__asan_global*, unsigned long), void const*)+0x7c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3bb84)
    #2 0x000102b5699c in __asan_register_image_globals+0x34 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x1299c)
    #3 0x000102a8dc60 in asan.module_ctor+0x18 (libpetbmi.1.0.0.dylib:arm64+0x5c60)
    #4 0x000199deeef8 in invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const+0x1b8 (dyld:arm64e+0xfffffffffff56ef8)
    #5 0x000199e2b898 in invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const+0x140 (dyld:arm64e+0xfffffffffff93898)
    #6 0x000199e4b5c8 in invocation function for block in mach_o::Header::forEachSection(void (mach_o::Header::SectionInfo const&, bool&) block_pointer) const+0xec (dyld:arm64e+0xfffffffffffb35c8)
    #7 0x000199e48354 in mach_o::Header::forEachLoadCommand(void (load_command const*, bool&) block_pointer) const+0xcc (dyld:arm64e+0xfffffffffffb0354)
    #8 0x000199e49a94 in mach_o::Header::forEachSection(void (mach_o::Header::SectionInfo const&, bool&) block_pointer) const+0x78 (dyld:arm64e+0xfffffffffffb1a94)
    #9 0x000199e2b368 in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const+0x200 (dyld:arm64e+0xfffffffffff93368)
    #10 0x000199deecb0 in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const+0xac (dyld:arm64e+0xfffffffffff56cb0)
    #11 0x000199df666c in dyld4::JustInTimeLoader::runInitializers(dyld4::RuntimeState&) const+0x20 (dyld:arm64e+0xfffffffffff5e66c)
    #12 0x000199def45c in dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&, dyld3::Array<dyld4::Loader const*>&) const+0x130 (dyld:arm64e+0xfffffffffff5745c)
    #13 0x000199df3bec in dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const::$_0::operator()() const+0xb0 (dyld:arm64e+0xfffffffffff5bbec)
    #14 0x000199def778 in dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const+0x2c8 (dyld:arm64e+0xfffffffffff57778)
    #15 0x000199e13140 in dyld4::APIs::dlopen_from(char const*, int, void*)::$_0::operator()() const+0x73c (dyld:arm64e+0xfffffffffff7b140)
    #16 0x000199e07f98 in dyld4::APIs::dlopen_from(char const*, int, void*)+0x46c (dyld:arm64e+0xfffffffffff6ff98)
    #17 0x000199e07a7c in dyld4::APIs::dlopen(char const*, int)+0x7c (dyld:arm64e+0xfffffffffff6fa7c)
    #18 0x000102b719b0 in dlopen+0x1bc (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x2d9b0)
    #19 0x000102151300 in models::bmi::AbstractCLibBmiAdapter::dynamic_library_load() AbstractCLibBmiAdapter.cpp:86
    #20 0x00010215de8c in models::bmi::Bmi_C_Adapter::execModuleRegistration() Bmi_C_Adapter.hpp:466
    #21 0x0001021541d0 in models::bmi::Bmi_C_Adapter::construct_and_init_backing_model_for_type() Bmi_C_Adapter.hpp:556
    #22 0x000102153bd0 in models::bmi::Bmi_C_Adapter::Bmi_C_Adapter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool) Bmi_C_Adapter.cpp:64
    #23 0x000102153890 in models::bmi::Bmi_C_Adapter::Bmi_C_Adapter(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) Bmi_C_Adapter.cpp:34
    #24 0x000102120b84 in void std::__1::allocator<models::bmi::Bmi_C_Adapter>::construct[abi:ne190102]<models::bmi::Bmi_C_Adapter, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&>(models::bmi::Bmi_C_Adapter*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) allocator.h:165
    #25 0x000102118c88 in realization::Bmi_C_Formulation::construct_model(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>> const&) Bmi_C_Formulation.cpp:27
    #26 0x000102126ad8 in realization::Bmi_Module_Formulation::inner_create_formulation(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>>, bool) Bmi_Module_Formulation.cpp:330
    #27 0x000102124b2c in realization::Bmi_Module_Formulation::create_formulation(boost::property_tree::basic_ptree<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, geojson::JSONProperty, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, geojson::JSONProperty>>>*) Bmi_Module_Formulation.cpp:8
    #28 0x00010204d20c in Bmi_C_Pet_IT_Test_InitModel_Test::TestBody() Bmi_C_Pet_IT.cpp:172
    #29 0x0001020c3f40 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2635

==6567==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'pet_method_int' at /Users/runner/work/ngen/ngen/extern/evapotranspiration/evapotranspiration/src/pet.c in libpetbmi.1.0.0.dylib
==6567==ABORTING
/Users/runner/work/_temp/3bc8ad1e-8735-4e77-9e5f-8132849d396a.sh: line 1:  6567 Abort trap: 6           ./cmake_build/test/compare_pet
Error: Process completed with exit code 134.
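
The HINT above uses the ASAN_OPTIONS environment variable. ASan also lets a program embed default options by defining `__asan_default_options()`, which the runtime calls at startup. ASAN_OPTIONS is parsed afterwards, so it still overrides whatever this function returns. The sketch below assumes the goal is to silence only the ODR check for one test executable (for example `compare_pet`) while the duplicate `pet_method_int` definition is investigated. This hides the report rather than fixing the duplicate global, so it would be a stopgap at best.

```c
/* Sketch: default ASan options embedded in a test executable.
 * The ASan runtime calls this hook (if defined) at startup and parses the
 * returned string with the same syntax as ASAN_OPTIONS; the environment
 * variable is parsed later and takes precedence.
 * Assumption: defined in the main test executable, with default visibility
 * and marked `used` so the linker does not drop it. */
__attribute__((used, visibility("default")))
const char *__asan_default_options(void) {
    /* Suppresses odr-violation reports only; other ASan checks stay on. */
    return "detect_odr_violation=0";
}
```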

@robertbartel
Contributor

Ok, I've managed to find some changes that seem to address at least some of the issues.

I take it back: whether the error occurs does not appear to be directly related to the changes.

Consider these Action runs:

  1. Run 276 (commit 9bbb9a9)
    i. Passes everything
  2. Run 279 (commit 24cbe26)
    i. Fails just test_bmi_multi
  3. Run 280 (commit 991d100)
    i. Fails test_bmi_multi, and now also the earlier issue in test_bmi_c

The commits for runs 1 and 3 do not differ, and the changes between them and the commit for run 2 don't look meaningful in any way that could cause these problems.

@robertbartel
Contributor

robertbartel commented Nov 19, 2025

Ok, I think I may have at least figured out how to reproduce some of these errors locally. CMake debug builds effectively use -O0 by default (which is all I was using before), but if I deliberately set -O1 (as the Action step is configured to do for builds), I occasionally see all of the previously discussed ASan errors (and sometimes perhaps another global buffer overflow ASan error with test_bmi_multi, different from the "wild pointer" one).

I also ran some experiments with test_bmi_c and the original buffer overflow that was discussed. To be clear: I compiled things only once and ran the same executable repeatedly. Of 1500 runs of cmake_build/test/test_bmi_c, 1335 succeeded and 165 failed (about 11%). I did not capture output for every run, so I can't confirm this definitively, but in every run for which I did capture output, the error was that same global buffer overflow related to output_var_types.
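
The earlier report's `READ of size 8` in `Get_var_type` matches the width of a pointer on arm64, which is consistent with (but does not prove) a read one element past the end of a global array of `const char *`, such as a name or type table. As a minimal sketch of the bounds-safe pattern, with hypothetical array contents and a hypothetical function name (not taken from bmi_test_bmi_c.c), the lookup takes its loop bound from the array itself:

```c
#include <string.h>

/* Hypothetical tables: contents and count are illustrative only. */
#define OUTPUT_VAR_COUNT 2
static const char *output_var_names[OUTPUT_VAR_COUNT] = {"OUTPUT_VAR_1", "OUTPUT_VAR_2"};
static const char *output_var_types[OUTPUT_VAR_COUNT] = {"double", "double"};

/* Copies the type of `name` into `type` (capacity `type_len`).
 * Returns 0 on success, 1 if the name is unknown or the buffer is too small.
 * The loop bound comes from the array size, so no pointer past the end of
 * either table is ever read. */
int lookup_output_var_type(const char *name, char *type, size_t type_len) {
    for (size_t i = 0; i < sizeof output_var_names / sizeof output_var_names[0]; i++) {
        if (strcmp(name, output_var_names[i]) == 0) {
            size_t n = strlen(output_var_types[i]);
            if (n + 1 > type_len) {
                return 1; /* refuse to overflow the caller's buffer */
            }
            memcpy(type, output_var_types[i], n + 1); /* includes the terminator */
            return 0;
        }
    }
    return 1;
}
```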

@robertbartel
Copy link
Contributor

I ran another similar set of experiments in my Mac dev environment, compiling with -O0 and running test_bmi_c and test_bmi_cpp 1000 times each. In both cases, there were no ASan errors.
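
For scale: 165 failures in 1500 runs is an 11% failure rate, while 0 failures in 1000 runs bounds the per-run rate below about 3/1000 (0.3%) at 95% confidence by the "rule of three", well below the rate seen at -O1. The repeated-run experiment is easy to script. A minimal C sketch, where `count_failed_runs` is a hypothetical helper and not part of the ngen tree:

```c
#include <stdlib.h>

/* Runs `command` through system() `runs` times and returns the number of
 * failed runs. Any non-zero status counts as a failure, which includes
 * abnormal termination such as ASan's abort. Returns -1 if system() itself
 * could not start a child process. */
int count_failed_runs(const char *command, int runs) {
    int failures = 0;
    for (int i = 0; i < runs; i++) {
        int status = system(command);
        if (status == -1) {
            return -1;
        }
        if (status != 0) {
            failures++;
        }
    }
    return failures;
}
```

For example, `count_failed_runs("./cmake_build/test/test_bmi_c > /dev/null 2>&1", 1500)` discards the test output. To confirm which ASan error each failing run hit, redirect each run to its own log file instead.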

At this point, I suggest we adjust the Actions so that -O0 is used for Mac runners. We should also open a separate issue to investigate these ASan errors further, though not necessarily right now.
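
If the Mac jobs are switched to -O0, a test binary could report the configuration it was actually compiled with, so CI logs show which flags produced a given result. A minimal sketch with hypothetical function names, relying on the `__OPTIMIZE__` macro that GCC and Clang predefine in optimizing builds, Clang's `__has_feature(address_sanitizer)`, and GCC's `__SANITIZE_ADDRESS__`:

```c
/* Returns 1 if this translation unit was compiled with optimization
 * (-O1 or higher), 0 for -O0. GCC and Clang predefine __OPTIMIZE__
 * only in optimizing compilations. */
int built_with_optimization(void) {
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if this translation unit was compiled with -fsanitize=address.
 * GCC predefines __SANITIZE_ADDRESS__; Clang exposes the same information
 * through __has_feature(address_sanitizer). */
int built_with_asan(void) {
#if defined(__SANITIZE_ADDRESS__)
    return 1;
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
    return 1;
#  else
    return 0;
#  endif
#else
    return 0;
#endif
}
```

Printing both values at test startup is enough to distinguish an -O0 run from an -O1 run in the Action logs.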

As far as I can tell, -O1 or higher is recommended for use with ASan because -O0 is slow, but using -O0 should be perfectly valid. I've not seen anything suggesting -O1 is better for error detection, e.g., that -O1 catches errors that -O0 would miss (if anything, the reverse seems more likely). Obviously something subtle and contrary to expectations is going on here, which someone will eventually need to figure out, but I think we should prioritize getting the CI checks working.

@hellkite500, @aaraney, @PhilMiller: thoughts? We can assemble and discuss in more detail if needed.

